Investigating Diatopic Variation in a Historical Corpus
Abstract
This paper investigates diatopic variation in a historical corpus of German. Based on equivalent word forms from different language areas, replacement rules and mappings are derived which describe the relations between these word forms. These rules and mappings are then interpreted as reflections of morphological, phonological or graphemic variation. Based on sample rules and mappings, we show that our approach can replicate results from historical linguistics. While previous studies were restricted to predefined word lists, or confined to single authors or texts, our approach uses a much wider range of data available in historical corpora.
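As a rough illustration of how replacement rules might be derived from pairs of equivalent word forms, the following minimal sketch aligns two spellings and collects the substrings that differ. It rests on several assumptions: the abstract does not state which alignment method the paper uses, so Python's `difflib.SequenceMatcher` serves here only as a stand-in; the function names `replacement_rules` and `count_rules` are hypothetical; and the word pairs in the usage note are invented toy examples, not data from the corpus.

```python
from collections import Counter
from difflib import SequenceMatcher


def replacement_rules(form_a: str, form_b: str) -> list[tuple[str, str]]:
    """Return (source, target) substring pairs where the two forms differ.

    Assumption: a generic character alignment (difflib) stands in for
    whatever alignment procedure the paper actually applies.
    """
    matcher = SequenceMatcher(a=form_a, b=form_b, autojunk=False)
    rules = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on either side marks an insertion or deletion.
            rules.append((form_a[i1:i2], form_b[j1:j2]))
    return rules


def count_rules(pairs: list[tuple[str, str]]) -> Counter:
    """Count how often each replacement rule occurs across all word-form pairs."""
    counts: Counter = Counter()
    for form_a, form_b in pairs:
        counts.update(replacement_rules(form_a, form_b))
    return counts
```

Calling `count_rules([("tag", "dag"), ("teil", "deil"), ("und", "vnd")])` on these invented pairs counts the rule `("t", "d")` twice and `("u", "v")` once. Frequent rules of this kind are the sort of evidence that could then be interpreted as phonological or graphemic variation between language areas.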
Similar Resources
Mapping Diatopic and Diachronic Variation in Spoken Czech: the Ortofon and Dialekt Corpora
ORTOFON and DIALEKT are two corpora of spoken Czech (recordings + transcripts) which are currently being built at the Institute of the Czech National Corpus. The first one (ORTOFON) continues the tradition of the CNC’s ORAL series of spoken corpora by focusing on collecting recordings of unscripted informal spoken interactions (“prototypically spoken texts”), but also provides new features, mos...
Crowdsourcing Dialect Characterization through Twitter
We perform a large-scale analysis of language diatopic variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis proves the existence of well defined macroregions sharing commo...
VOLIP: a corpus of spoken Italian and a virtuous example of reuse of linguistic resources
The corpus VoLIP (The Voice of LIP) is an Italian speech resource which associates the audio signals to the orthographic transcriptions of the LIP Corpus. The LIP Corpus was designed to represent diaphasic, diatopic and diamesic variation. The Corpus was collected in the early ‘90s to compile a frequency lexicon of spoken Italian and its size was tailored to produce a reliable frequency lexicon...
Choices over time: methodological issues in investigating current change
The fact that English is changing is immediately apparent to a modern reader of, say, 18th or 19th century literature, or indeed to a teenager speaking to an elderly relative. However, as Mair (2006) points out, anecdotal evidence for linguistic change is unreliable. The systematic study of language change requires large, evenly balanced, and reliably annotated corpora with texts sampled over a...
Pitch and duration in RP: A corpus-based historical exploration
As a reference variety, Received Pronunciation (RP), a non-regional accent of British English, is extensively described both in terms of synchronic variation and its historical development. There is a long tradition of describing changes over time in the traditional phonemic framework. However, modern corpus-based acoustic investigations have not been attempted on material older than the 1950s ...